feat: short-term memory system #52
Conversation
Adds deterministic short-term memory with three storage mechanisms:
- Auto-store from tool responses via the memory_hint field
- Explicit memory_short tool (store/get/delete/list actions)
- HTTP API endpoints for external access

Backend: src/caal/memory/ package with file-based JSON persistence, singleton pattern, TTL support, and context injection into the LLM.

Frontend: Memory Panel UI with Brain icon button, entry list, detail modal, and clear-all functionality. Includes i18n translations for en, fr, it.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
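For orientation, a minimal sketch of the shape such a store could take, assuming a file-backed JSON dict with lazy TTL expiry; the class and method names here are illustrative and don't necessarily match the real src/caal/memory/ package:

```python
import json
import time
from pathlib import Path

_UNSET = object()  # sentinel: "caller didn't pass a TTL"

class ShortTermMemory:
    """Illustrative file-backed store; not the actual src/caal/memory/ code."""

    _instance = None  # singleton, per the commit message

    def __init__(self, path: Path, default_ttl: int = 604800):
        self.path = path
        self.default_ttl = default_ttl  # 7 days, per this PR
        self._entries = json.loads(path.read_text()) if path.exists() else {}

    @classmethod
    def instance(cls, path: Path) -> "ShortTermMemory":
        if cls._instance is None:
            cls._instance = cls(path)
        return cls._instance

    def store(self, key, value, ttl=_UNSET):
        # ttl omitted -> default 7d; ttl=None -> never expires; int -> custom.
        if ttl is _UNSET:
            expires = time.time() + self.default_ttl
        elif ttl is None:
            expires = None
        else:
            expires = time.time() + ttl
        self._entries[key] = {"value": value, "expires": expires}
        self._flush()

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        if entry["expires"] is not None and entry["expires"] < time.time():
            self.delete(key)  # lazily expire on read
            return None
        return entry["value"]

    def delete(self, key):
        if self._entries.pop(key, None) is not None:
            self._flush()

    def list(self):
        # Copy keys first: get() may delete expired entries while we iterate.
        return [k for k in list(self._entries) if self.get(k) is not None]

    def _flush(self):
        self.path.write_text(json.dumps(self._entries))
```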
- Change default TTL from 24h to 7 days (604800s)
- Allow tools to specify a custom TTL in memory_hint (see the examples after this commit message):
- Simple value: uses default 7d TTL
- {"value": ..., "ttl": seconds}: custom TTL
- {"value": ..., "ttl": null}: no expiry
Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
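For concreteness, here are hypothetical tool responses carrying each of the three shapes; the surrounding field layout (a memory_hint dict keyed by memory name) is an assumption, only the TTL semantics come from the commit above:

```python
# Hypothetical tool responses showing the three memory_hint shapes.

# Simple value: stored with the default 7-day TTL.
{"result": "Flight found", "memory_hint": {"flight_number": "UA123"}}

# {"value": ..., "ttl": seconds}: custom TTL (here, one hour).
{"result": "Flight found",
 "memory_hint": {"flight_number": {"value": "UA123", "ttl": 3600}}}

# {"value": ..., "ttl": null}: no expiry (JSON null is None in Python).
{"result": "Saved", "memory_hint": {"home_airport": {"value": "SFO", "ttl": None}}}
```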
Replace the linear execute → stream → retry flow with a loop that supports multi-step tool chaining. The model can now: call tool A → get result → call tool B → get result → generate a text response.

Previously, after one tool execution the code tried to stream a text response. If the model wanted to chain (call another tool), it produced 0 text chunks, triggering a retry without tools that crashed Ollama (tool references in messages but no tools registered).

New flow (sketched below):
- Loop non-streaming chat() calls (max 5 rounds)
- Each round: if tool_calls → execute → loop back
- When no tool_calls → yield content or stream the final response
- Safety fallback: _strip_tool_messages converts tool messages to plain text if Ollama still crashes on the streaming path

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
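A rough sketch of that loop shape, assuming stand-in helpers chat (non-streaming), stream_chat (streaming), execute_tool, and the _strip_tool_messages fallback named above; this is not the literal CAAL code:

```python
MAX_ROUNDS = 5  # cap on chained tool rounds, per the commit message

async def run_turn(messages: list[dict], tools: list[dict]):
    for _ in range(MAX_ROUNDS):
        # Non-streaming round so tool_calls can be inspected before streaming.
        reply = await chat(messages, tools=tools)
        tool_calls = reply.get("tool_calls")
        if not tool_calls:
            if reply.get("content"):
                yield reply["content"]  # model already produced the answer
            else:
                try:
                    async for chunk in stream_chat(messages, tools=tools):
                        yield chunk
                except Exception:
                    # Safety fallback: flatten tool messages to plain text
                    # if Ollama still crashes on the streaming path.
                    async for chunk in stream_chat(_strip_tool_messages(messages)):
                        yield chunk
            return
        # Tool round: record the assistant's calls, execute, and loop back.
        messages.append({"role": "assistant", "content": "", "tool_calls": tool_calls})
        for call in tool_calls:
            result = await execute_tool(call)  # deterministic n8n workflow, etc.
            messages.append({"role": "tool", "content": result})
```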
…o context

- Deduplicate identical tool calls within a single round (same name + args; sketched below)
- Accumulate tool names/params across chained rounds for the frontend indicator
- Keep the tool indicator showing after the response (don't clear it when tools were used)
- Include tool call arguments in ToolDataCache context injection

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
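The dedup in the first bullet can be as simple as keying each call on its name plus canonicalized arguments; a sketch, assuming a flat {name, arguments} call shape:

```python
import json

def dedupe_tool_calls(tool_calls: list[dict]) -> list[dict]:
    """Drop repeated calls with the same name and arguments within one round."""
    seen: set[tuple[str, str]] = set()
    unique = []
    for call in tool_calls:
        # Serialize args with sorted keys so equal dicts produce equal keys.
        key = (call["name"], json.dumps(call.get("arguments", {}), sort_keys=True))
        if key not in seen:
            seen.add(key)
            unique.append(call)
    return unique
```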
Memory file writes were failing with permission denied because /app is owned by root. The store now uses CAAL_MEMORY_DIR=/app/data (the caal-memory volume), and the entrypoint ensures the directory is writable by the agent user.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
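On the Python side this presumably reduces to resolving the directory from the environment, roughly as follows (the file name is an assumption):

```python
import os
from pathlib import Path

# CAAL_MEMORY_DIR points at the caal-memory volume; /app/data is the
# default documented in this commit. "memory.json" is illustrative.
memory_dir = Path(os.environ.get("CAAL_MEMORY_DIR", "/app/data"))
memory_dir.mkdir(parents=True, exist_ok=True)  # entrypoint also ensures writability
memory_file = memory_dir / "memory.json"
```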
Prevents the LLM from using memory data in the initial greeting. Memory context is now skipped when there are no user messages yet. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
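A sketch of that guard, assuming a hypothetical inject_memory_context helper:

```python
def inject_memory_context(messages: list[dict], memory_block: str) -> list[dict]:
    # Skip injection until the user has actually said something,
    # so the greeting can't leak memory contents.
    if not any(m.get("role") == "user" for m in messages):
        return messages
    return [{"role": "system", "content": memory_block}, *messages]
```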
…haining

Context injection helps the LLM know what's in memory so it can chain tools correctly (e.g. memory_short → flight_tracker). Without it, the model may skip memory and go straight to other tools.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
…missions

- Memory detail modal now has a pencil icon to edit values in place
- Add a registry_cache.json symlink to entrypoint.sh (same pattern as settings.json) to fix permission denied on /app/registry_cache.json

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Ministral-3's recommended instruction temperature is 0.15. The old 0.7 default was overriding the Modelfile setting on every API call.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
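With the Ollama Python client, anything passed in options wins over the Modelfile parameter, which is how a hard-coded 0.7 default kept clobbering it. A hedged sketch of the corrected call (the model name is an assumption):

```python
import ollama

# Per-call options override Modelfile parameters, so the temperature sent
# here must match the intended 0.15 rather than a stale 0.7 default.
response = ollama.chat(
    model="ministral-3",  # illustrative model name
    messages=[{"role": "user", "content": "Is my flight on time?"}],
    options={"temperature": 0.15},  # Ministral-3's recommended instruct temperature
)
```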
You have probably thought about this more thoroughly than I have, but...

IMO memory is likely to be the difference between Siri and openclaw, i.e. limited vs. limitless. In other words, it's complicated. But there has to be a lot of research on this, so it's probably not necessary to reinvent the wheel.
@Sophist-UK, great comment, and you're touching on something I've been thinking about a lot. You're right that memory has layers.

Short-term is step one - this PR covers transient data like flight numbers and package tracking, things that are useful for a few days and then expire. TTL-based, simple, predictable.

Long-term memory is planned as well. Thinking graph-based (something like Graphiti) with embeddings for contextual retrieval. This is where "Corey prefers morning flights" or relationships between contacts and preferences would live. Your points about forgetting and fuzzy contextual search are spot on for that layer - metadata-driven expiry and hybrid search (semantic + keyword) are likely where that heads. The trick is to retrieve that information when necessary and inject it.

Where CAAL's approach differs from what you might be picturing is the role memory plays. In CAAL's architecture, the LLM is a router - it decides which tool to call and with what parameters, and then deterministic n8n workflows execute. Memory serves that routing. When you say "is my flight on time?" the model needs to know which flight so it can call the right tool with the right parameters. It's not accumulating capability or learning new skills - memory is data that gives it enough context to make better routing decisions.

Skills in CAAL are n8n workflows. They go through review (automated + human) before they're live. The model can build new workflows (we showed this in a previous video), but it has to be prompted to do so, and the method is calling another workflow that uses a larger LLM to generate the workflow. That boundary is intentional - it's what lets an 8B model be reliable and secure. The model doesn't need to be smart enough to self-improve; it needs to be smart enough to route.

So to your five points:
- 1 and 2 - yes, layered memory with graph + embeddings is on the roadmap.
- 3 - agreed, and scoping what the model can do with memory (route, not execute) helps bound that risk.
- 4 - absolutely, TTL is built into this PR, and long-term will need smarter expiry.
- 5 - contextual retrieval is key for the long-term layer.

Appreciate the thoughtful input. This is exactly the kind of discussion that helps shape the architecture. Any experience with Graphiti or similar?

cmac
Summary
- memory_hint auto-store, explicit memory_short tool (store/get/delete/list), and HTTP API
- caal-memory volume with 7-day default TTL
- Tool chaining: memory_short (get) → flight_tracker
- Permission fixes for /app/registry_cache.json and memory persistence

Architecture
Three storage paths:
- memory_hint in response → auto-stored
- memory_short (store)
- POST /memory for external systems

Context injection serves as an awareness layer: the LLM sees what's in memory so it knows to chain tools (e.g. pull an email address from memory → send via Gmail), but retrieval still goes through the tool for verification.
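For the third path, an external system could write an entry over HTTP; the POST /memory endpoint comes from this PR, but the host, port, and payload fields below are illustrative:

```python
import requests

# Hypothetical call to the memory HTTP API; only the /memory path is
# from the PR, the rest of this request is an assumed shape.
resp = requests.post(
    "http://localhost:8080/memory",
    json={"key": "package_tracking", "value": "1Z999AA10123456784", "ttl": 604800},
    timeout=5,
)
resp.raise_for_status()
```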
Test plan
🤖 Generated with Claude Code